Abstract- A multilayer neural network has been
developed that consists of slabs of single neuron models.
Each slab is composed of a single type of neuron,
and the neuron type differs between the slabs. The network was
trained on EMG data using a biologically inspired, Hebbian-like
learning rule, and good training/testing
classification performance was obtained. It was shown
that the biologically inspired network, the novel
architecture of which is derived from the functionally
distinct hypercolumns of neurons in the brain, can be
successfully applied to difficult classification tasks.
I. Introduction

Neural  tissue  has  the  ability  to  learn  and  memorize  a 
vast  amount  of  information  over  very  short  time  scales 
with  little  exposure  to  the  learning  data.  The  mechanisms 
underlying  learning  and  memory  processes  in  the  brain  are 
only  partially  identified  to  this  date.  Biophysical  and 
computational  studies  indicate  that  information  is  learned 
through  a  process  of  structural  modifications  that  take 
place  in  synaptic  terminals  between  neurons,  in  response  to 
the  frequency  of  activation  [1-3].  This  process  is  believed 
to  be  guided  by  locally  available  signals  at  synaptic  sites 
(Hebbian  learning  rules)  while  it  could  also  be  partially 
stochastic  and  partially  regulated  by  backpropagating 
messages sent from the cell body to the modifiable sites [4-6].
Previous computational work has shown that such a
mechanism could significantly boost the memory capacity
of biological neurons [7-9]. In this work, a supervised
Hebbian-Stochastic  learning  rule  is  implemented  in  a  novel 
neural  network  model  and  the  performance  is  compared  to 
existing rules. The architecture of the proposed model is
also inspired by the morphological and biophysical
properties of neurons in the brain. Specifically, the model
is composed of three layers where the neurons in the
hidden layer are divided into two distinct categories (slabs).
Each  slab  is  made  of  identical  single  neuron  models  but 
neurons  between  the  slabs  are  different.  This  differentiation 
in  the  neuronal  processing  properties,  which  in  the  neural 
network  case  is  depicted  by  using  different  single  neuron 
models,  has  been  observed  in  the  primary  visual  cortex 
(V1). In the 1960s, Hubel and Wiesel described cells in
V1 which responded to the image of oriented bars and
edges [10, 11]. The most basic cell type, which they
described, responded to a stationary, spatially localized
oriented contour and this cell was dubbed the simple cell.
Hubel and Wiesel also reported a striking regularity in the
organization of the cells in V1 based upon cortical columns
(or  hypercolumns)  running  tangentially  to  the  cortical 
surface.  They  showed  that  neurons  selective  to  a 
continuous  range  of  different  orientations  were  grouped 
together  in  one  hypercolumn  [11,  12].  This  functional 
differentiation  has  been  suggested  to  enhance  learning  by 
promoting  decorrelation  of  the  information  learned  by 
different  parts  of  the  physical  network.  Based  on  this 
evidence,  the  present  work  investigates  the  classification 
performance  of  a  novel  neural  network  architecture,  where 
the  hidden  layer  of  a  three-layer  network  is  divided  into 
sets  of  different  single  neuron  models. 
Figure 1.

The  proposed  neural  network  is  composed  of  three  layers:  an  input  layer, 
two  hidden  slabs  with  dissimilar  neurons  and  an  output  layer.  The  size  of 
the  network  in  the  present  study  is  24  input  neurons,  10  hidden  neurons  (5 
per  slab)  and  three  output  neurons. 
II. METHODOLOGY


Previous work has shown that a multi-slab architecture
similar to the one shown in Figure 1 can yield high
classification scores on a variety of applications [13]. Based
on these earlier models, we develop a three-layer
feedforward neural network which consists of: (1) an input
layer where input features are pre-processed with the use of
adaptable Gaussian activation functions (Receptive Fields),
(2) two slabs of hidden neurons where single neuron
models within each slab are the same but differ between
slabs and (3) an output layer where the outputs of each
slab are linearly combined before being passed through an
adaptable Logistic Activation Function (LAF). The
Receptive Field (RF) equation for each distinct input
feature k is given by:

f_k(x_k) = a_k \cdot \exp\left(-s_k \cdot (x_k - H_k)^{n_k}\right)   (1)

where x_k is the kth input feature of training pattern x. Thus,
the number of RFs in the input layer is equal to the number
of input features in the learning set. The parameters for
each RF (a_k, s_k, H_k, n_k) as well as the LAF parameters (B_j,
C_j) shown in equation 6 are modified during learning by
±10% random changes in their magnitude. Changes are
only kept if they lead to lower MSE. Receptive Fields in
the  network  are  similar  to  spatial  receptive  fields  of 
neurons  in  the  LGN  or  the  retina,  where  visual  information 
is  first  processed  in  the  brain  [10,  14].  The  receptive  fields 
of  these  neurons  have  been  shown  to  have  a  Gaussian-like 
(center-surround)  shape  where  stimuli  that  lie  in  the  center 
of  the  receptive  field  (RF)  excite  the  cell  while  stimuli  in 
the  periphery  of  the  RF  do  not  cause  neuron  firing.  The 
idea  behind  this  adaptable  design  is  to  allow  each  input 
neuron  to  focus  on  a  specific  subset  of  the  input  data  thus 
promoting  the  decorrelation  of  patterns  learned  by  different 
parts  of  the  network,  a  method  that  has  been  shown  to 
enhance  information  learning  in  the  brain  [15,  16].  The 
hidden layer of the network consists of dissimilar single
neuron models divided into two slabs, where neurons within
each  slab  are  identical.  The  two  types  of  single  neuron 
models  are  given  by  the  equations: 


\alpha_1(p) = \begin{cases} 0 & \text{if } p < 0 \\ 1 & \text{if } p > 0 \end{cases}   (3)

for the first slab and

\alpha_2(p) = p^{10}   (4)

for the second slab, where p is the weighted sum of the post-processed
input pattern f = [f_1(x_1) f_2(x_2) ... f_N(x_N)]:

p_i(f) = \sum_{k=1}^{N} w_{k,i}^{L} \cdot f_k(x_k)   (5)


where w_{k,i}^{L} is the weight between Receptive Field k and
neuron i in slab L = 1, 2. A power of ten was selected for
neurons  in  slab  2  based  on  the  results  of  an  earlier  work 
where  morphologically  realistic  neurons  were  used  on 
classification  tasks  [9].  The  use  of  dissimilar  neuron 
models  is  another  means  of  promoting  the  decorrelation  of 
information  learned  by  the  two  slabs  of  the  network.  The 
outputs  of  both  slabs  were  linearly  combined  and  fed  to  the 
third layer of the network. The activation functions for the
j = 1, ..., J neurons in the output layer are given by:


Y_j(A_j) = \frac{B_j}{1 + \exp(-C_j A_j)}   (6)

with A = [A_1 A_2 ... A_n] the matrix of the combined slab
output:

A = [W^1, W^2] \cdot [\alpha_1, \alpha_2]^T   (7)

and W^1_{i,j}, W^2_{i,j} equal to the weights between the hidden
layers and the output layer of the network.
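
The forward pass described by equations (1) to (7) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function and variable names are hypothetical, the initial parameter values are arbitrary, the exponent n_k is fixed to 2 so that the receptive field is Gaussian, and the step function is taken to return 0 at p = 0, a case the text leaves undefined.

```python
import numpy as np

def receptive_field(x, a, s, h, n):
    """Equation (1), applied element-wise to the N input features."""
    return a * np.exp(-s * (x - h) ** n)

def slab1_activation(p):
    """Equation (3): step function (p == 0 mapped to 0, an assumption)."""
    return (p > 0).astype(float)

def slab2_activation(p):
    """Equation (4): tenth power of the weighted sum."""
    return p ** 10

def forward(x, rf, w1, w2, wo1, wo2, b, c):
    """Input pattern -> receptive fields -> two slabs -> logistic outputs."""
    f = receptive_field(x, *rf)          # post-processed pattern, shape (N,)
    a1 = slab1_activation(f @ w1)        # slab 1 outputs, equation (5) then (3)
    a2 = slab2_activation(f @ w2)        # slab 2 outputs, equation (5) then (4)
    A = a1 @ wo1 + a2 @ wo2              # linear combination, equation (7)
    return b / (1.0 + np.exp(-c * A))    # logistic output, equation (6)

# Network size from Figure 1: 24 inputs, 5 neurons per slab, 3 outputs.
rng = np.random.default_rng(0)
N, H, J = 24, 5, 3
rf = (np.ones(N), np.ones(N), np.zeros(N), np.full(N, 2.0))  # a, s, H, n
w1 = rng.uniform(-0.1, 0.1, (N, H))
w2 = rng.uniform(-0.1, 0.1, (N, H))
wo1 = rng.uniform(-0.1, 0.1, (H, J))
wo2 = rng.uniform(-0.1, 0.1, (H, J))
b, c = np.ones(J), np.ones(J)
y = forward(rng.normal(size=N), rf, w1, w2, wo1, wo2, b, c)
```

With B_j = 1 each output lies between 0 and 1, so the three outputs can be read as class scores for one input pattern.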

Hebbian  Annealing  Rule  (HAR):  A  biologically  inspired 
learning  rule  was  developed  which  is  compared  to  a 
random  modifications  rule  used  in  previous  studies  [17]. 
Learning in the proposed network proceeds as follows: (1)
weights and activation function variables are initialized at
random; (2) at each step, a neuron (in each slab and the output
layer) is selected at random and three weights from the
weight matrix associated with this neuron are randomly
selected. A score is calculated for each weight as shown by
the equation:

S_w = \mathrm{cor}\left(M_{in}(I),\, M_{out}(I)\right)   (8)

where the function cor(a, b) calculates the correlation
coefficient between vectors a and b, M_in is the row of the
input matrix corresponding to the selected weight w (or
W), M_out is the corresponding row of the output matrix for
the selected neuron, and I is an index used to identify all
misclassified patterns. Thus, M_in(I) and M_out(I) indicate
that the correlation is measured only over the patterns that
were misclassified in the specific run. This method was
previously shown to lead to faster convergence of the
algorithm [9]. The score for each weight in the selected
pool is calculated and the weight with the smallest score is
targeted for modification. The modification is a ±10%
change in its magnitude and the change is kept or rejected
based on a Boltzmann equation:


R = \frac{1}{1 + \exp\left((E_{new} - E_{old}) / T\right)}   (9)


A change is kept if E_new < E_old or a random number a
∈ [0,1] is smaller than the result of equation (9). A
temperature variable (T) is lowered by a scaling factor (Tf)
over the course of learning such that fewer bad changes
are accepted as the algorithm converges to a minimum. To
avoid local minima, an additional criterion was
incorporated in the learning rule. If the error rate was
unchanged for 300 consecutive iterations, T was increased
by a factor of sqrt(Tf). For the experiments reported here,
the initial temperature was T = 20 and the temperature reduction
factor was Tf = 0.95. Learning was terminated after a maximum
number of iterations (20,000) or when no further
improvement in the MSE was observed after 1000
consecutive iterations.
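
The building blocks of the Hebbian Annealing Rule can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the code used in the study: all function names are hypothetical, the ±10% modification is assumed to be exactly 10% with a random sign, the correlation is set to 0 when it is undefined, and the temperature is assumed to be multiplied by Tf at each cooling step (the text does not state how often cooling is applied). The reheating criterion is omitted.

```python
import math
import random

def correlation(a, b):
    """Pearson correlation coefficient; 0.0 if undefined (assumption)."""
    n = len(a)
    if n < 2:
        return 0.0
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def pick_weight(candidates, m_in, m_out, misclassified):
    """Equation (8): return the candidate with the smallest score S_w.

    candidates    -- indices of the randomly selected weights (rows of m_in)
    m_in          -- input matrix, one row per weight, one column per pattern
    m_out         -- output row of the selected neuron, one entry per pattern
    misclassified -- indices I of the misclassified patterns
    """
    def score(k):
        return correlation([m_in[k][i] for i in misclassified],
                           [m_out[i] for i in misclassified])
    return min(candidates, key=score)

def perturb(w, rng=random):
    """Change the magnitude of w by +10% or -10% (exact size is an assumption)."""
    return w * (1.1 if rng.random() < 0.5 else 0.9)

def acceptance_probability(e_new, e_old, t):
    """Equation (9): R = 1 / (1 + exp((E_new - E_old) / T))."""
    z = (e_new - e_old) / t
    if z > 700:              # exp would overflow; R is effectively 0
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def keep_change(e_new, e_old, t, rng=random):
    """Keep an improvement always, a worse state with probability R."""
    return e_new < e_old or rng.random() < acceptance_probability(e_new, e_old, t)

def cool(t, tf=0.95):
    """Lower the temperature by the scaling factor Tf."""
    return t * tf
```

At T = 20 a change that leaves the error unchanged is accepted with probability 0.5, and the probability of accepting a worse state shrinks as T is lowered.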


EMG  DATA 


Two network architectures were trained with both the Hebbian-Annealing
(HA) and the Random Modification (RM) rule. The first network
consisted of identical neurons in both hidden layers (dashed and
dashed-dotted lines: essentially linear neurons with a step activation function
shown in (3)) and the second network is depicted in Figure 1 (solid and
dotted lines). We show that the combination of the HA rule and the dissimilar neuron
network results in significantly better performance than the RM rule.
Moreover, the proposed architecture/learning rule combination has the
fastest convergence of all cases tested.


III. RESULTS

The neural network model was used for the classification
of electromyography (EMG) data, and the performance of the
Hebbian Annealing (HA) rule was compared to that of a
learning rule with random parameter modifications (RM).
The RM rule consisted of ±10% random
modifications applied alternately to the weight, RF and activation function
parameters, and the changes were kept only if
they resulted in a lower MSE. The above rule was
selected for comparison since it was previously shown to
be superior to both backpropagation and simulated
annealing rules in a neural network with similar
architecture [17].
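
The RM baseline amounts to a greedy random search, which can be sketched as below. This is a simplified illustration, not the procedure of [17]: the function name and the flat parameter list are hypothetical, the modification is assumed to be exactly ±10%, and the alternation between weight, RF and activation function parameters is replaced by picking any parameter at random.

```python
import random

def random_modification(params, error_fn, iterations, rng=random):
    """Greedy search: perturb one parameter by +/-10%, keep it only if the error drops."""
    params = list(params)
    best = error_fn(params)
    for _ in range(iterations):
        k = rng.randrange(len(params))
        old = params[k]
        params[k] = old * (1.1 if rng.random() < 0.5 else 0.9)
        err = error_fn(params)
        if err < best:
            best = err           # improvement: keep the change
        else:
            params[k] = old      # no improvement: revert
    return params, best

# Toy error surface standing in for the network MSE (hypothetical example).
def toy_error(p):
    return (p[0] - 3.0) ** 2 + (p[1] + 1.0) ** 2

start = [1.0, -2.0]
fitted, final_error = random_modification(start, toy_error, 200, random.Random(0))
```

Because worse states are never accepted, this rule cannot leave a local minimum, which is the behaviour the annealing step of the HA rule is meant to avoid.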

MATERIAL:  Motor  unit  action  potentials  (MUAPs) 
recorded  during  routine  electromyographic  (EMG) 
examination  provide  important  information  for  the 
assessment  of  neuromuscular  disorders.  In  this  study  we 
use  a  neural  network  model  to  analyse  MUAPs  using  the 
mean  and  standard  deviation  of  seven  time  domain 
parameters:  duration,  spike  duration,  amplitude,  area,  spike 
area,  number  of  phases  and  number  of  turns  [18].  A  total  of 
480  MUAPs  obtained  from  24  subjects,  8  NOR  (healthy),  8 
MYO  (myopathy)  and  8  MND  (motor  neuron  disease), 
were  used  for  training  the  ANN  classifiers,  whereas  a  total 
of  200  MUAPs,  obtained  from  10  subjects,  4  NOR,  3 
MYO  and  3  MND  were  used  for  evaluation. 

Due to the rather limited amount of input data and in
order to verify the correctness of the classification results, a
bootstrapping procedure was used. The system was trained
and evaluated using five different bootstrap sets where in
each set 24 different subjects were selected at random for
training and 10 different subjects for evaluation. A
representative error curve for both rules is shown in Figure
2. The convergence of the biological rule is
significantly faster than that of the random rule, as
shown in the graph. Finally, in addition to the MSE
minimization advantage, the classification performance of
the Hebbian Annealing rule is also better. The mean
percentage and the standard deviation (Std) of the correct
classification score, i.e. diagnostic yield, over the five
bootstrap sets for each rule are shown in Table 1.
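
The subject-level resampling described above can be sketched as follows. This is a minimal illustration with hypothetical names: the 34 subjects are represented by integer identifiers, and the draw is not stratified by diagnostic group, since the text does not state whether group proportions were preserved.

```python
import random

def bootstrap_splits(subjects, n_train=24, n_eval=10, n_sets=5, seed=0):
    """Return n_sets (training, evaluation) pairs of disjoint subject lists."""
    rng = random.Random(seed)
    splits = []
    for _ in range(n_sets):
        shuffled = rng.sample(subjects, len(subjects))   # random permutation
        splits.append((shuffled[:n_train],
                       shuffled[n_train:n_train + n_eval]))
    return splits

subjects = list(range(34))          # 24 + 10 subjects in total
splits = bootstrap_splits(subjects)
```

Splitting by subject rather than by individual MUAP keeps all recordings from one person on the same side of the split, so evaluation scores are not inflated by within-subject similarity.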

Table 1.

Learning rule        Training (% correct)    Evaluation (% correct)
Random Changes       75.23% +/- 10%          60.5% +/- 15%
Hebbian-Annealing    93.29% +/- 2%           86% +/- 5.4%

Training and Evaluation results for the two learning rules tested using the
model architecture shown in Figure 1. Both the Training and Evaluation
performance of the network trained with the Hebbian-Annealing rule are
significantly better than those obtained with the Random Modifications rule. In
both cases, the model was trained until no further improvement in the
MSE was observed after 1000 consecutive iterations.


IV.  Discussion 

A  novel  neural  network  model  was  developed,  the 
architecture  of  which  combines  the  integrative  properties 
of  biological  neurons  with  a  neurally  inspired  learning  rule. 
The three-layer neural network consisted of two parallel
hidden slabs, each composed of a distinct single neuron
model  type.  A  similar  network  architecture  was  first 
implemented  by  [17]  and  its  benefits  have  yet  to  be 
explored  fully.  The  present  work  has  shown  that  such 
artificial  neural  networks,  which  are  supported  by 
physiological  evidence  in  the  visual  system,  can  be 
successfully  used  on  difficult  memory/recognition  tasks. 
The  proposed  model  was  trained  with  a  supervised 
Hebbian-annealing  learning  rule,  and  was  found  to  give 
good  results  on  the  classification  problem  tested.  It  was 
shown  that  both  the  model  architecture  and  learning  rule 
are  responsible  for  improved  performance.  The  artificial 
neural network was found to perform significantly better
than a same-sized MLP network trained with either a
random change rule or the biological rule. Furthermore,
the proposed network performed at its best when trained
with  the  biological  learning  rule.  Thus,  it  can  be  concluded 
that  learning  mechanisms  employed  by  the  brain  can  be 
successfully  used  in  artificial  learning  tasks  and  possibly 
even  outperform  existing  algorithms. 